Cutting Through the Noise: Defining Ground Truth in Information Credibility on Twitter

Authors

  • Sujoy Sikdar
  • Byungkyu Kang
Abstract

The increased popularity of microblogs in recent years brings about a need for better mechanisms to extract credible or otherwise useful information from large, noisy data. While a great number of studies introduce methods to find credible data, there is no accepted credibility benchmark. As a result, it is hard to compare different studies and generalize from their findings. In this paper, we argue for a methodology for making such studies more useful to the research community. First, the underlying ground truth values of credibility must be reliable, and the specific constructs used to define credibility must be carefully identified. Second, the underlying network context must be quantified and documented. To illustrate these two points, we conduct a unique credibility study of two different data sets on the same topic, but with different network characteristics. We also conduct two different user surveys, and construct two additional indicators of credibility based on retweet behavior. Through a detailed statistical study, we first show that survey-based methods can be extremely noisy and results may vary greatly from survey to survey. However, by combining such methods with retweet behavior, we can incorporate two signals that are noisy but uncorrelated, resulting in ground truth measures that can be predicted with high accuracy and are stable across different data sets and survey methods. Newsworthiness of tweets can be a useful frame for specific applications, but it is not necessary for achieving reliable credibility ground truth measurements. We also show that the underlying model for predicting credibility can differ depending on the underlying network context, which needs to be clearly identified and reported in credibility studies to improve their impact.

Similar resources

On Refining Twitter Lists as Ground Truth Data for Multi-community User Classification

To help scholars and businesses understand and analyse Twitter users, it is useful to have classifiers that can identify the communities that a given user belongs to, e.g. business or politics. Obtaining high quality training data is an important step towards producing an effective multi-community classifier. An efficient approach for creating such ground truth data is to extract users from exi...

Event Detection via Communication Pattern Analysis

Social media applications such as Twitter provide a powerful medium through which users can communicate their observations with friends and with the world at large. We have witnessed live reporting of many events, from soccer games in Johannesburg to revolutions in Cairo and Tunis, and these reports have in many ways rivaled the content provided by the official media. Tapping into this valuable...

A Multi-Element Approach to Location Inference of Twitter: A Case for Emergency Response

Since its inception, Twitter has played a major role in real-world events, especially in the aftermath of disasters and catastrophic incidents, and has increasingly become the first point of contact for users wishing to provide or seek information about such situations. The use of Twitter in emergency response and disaster management opens up avenues of research concerning different aspec...

Probabilistic Inference of Twitter Users' Age Based on What They Follow

Twitter provides an open and rich source of data for studying human behaviour at scale and is widely used in social and network sciences. However, a major criticism of Twitter data is that demographic information is largely absent. Enhancing Twitter data with user ages would advance our ability to study social network structures, information flows and the spread of contagions. Approaches toward...

Limits of use of social media for monitoring biosecurity events

Compared to applications that trigger massive information streams, like earthquakes and human disease epidemics, the data input for agricultural and environmental biosecurity events (i.e., the introduction of unwanted exotic pests and pathogens) is expected to be sparse and less frequent. To investigate if Twitter data can be useful for the detection and monitoring of biosecurity events, we adop...

Publication date: 2013